Abstract

Dengue has created panic among people today. The menace of dengue has now spread from towns to rural areas. It heavily affects the body's organs and can ultimately lead to death. Its effects on human organs can persist for some years even after recovery. The disease is no longer confined to congested towns; it has broken out in full swing in rural areas. We aim to identify the factors that cause the origin of dengue and its spread through society on such a large scale. We also aim to find the areas of society in which consistent effort will help to confine the disease or reduce its effect to the zero level. Information was collected through a random survey, especially from people in dengue-affected areas, using the questionnaire method. Information was also gathered from hospitals and the Internet to collect data that help indicate which factors weigh heavily in which social situations. We reached our conclusions through experiments on past information and present data.

Keywords: Dengue; Ensemble; Random Forest; SVM; ROC AUC


1. Introduction


At present, dengue is one of the most dangerous diseases and is creating panic in human society. The disease is widespread not only in India; it has also propagated in developing countries. Many people are being affected in India, whereas 2.5 billion people are involved throughout the world. As per WHO reports, approximately 75% of the people affected by dengue fever (DF) live in the South-East Asia region and the Western Pacific region. Economic, social, and health burdens, together with illiteracy, are the main reasons for the spread of this disease. The population density of a few states of India, such as West Bengal, Uttar Pradesh, Maharashtra, and Tamil Nadu, is very high, which is also a critical reason for the spread of dengue in the human community. DF is not a seasonal epidemic fever, but its prevalence increases in the rainy season, because this season is favorable for the breeding of the Aedes mosquito, which spreads the dengue virus. This virus belongs to the family Flaviviridae and has four serotypes, DENV1, DENV2, DENV3, and DENV4, which are found in infected female Aedes mosquitoes. This species is found from 35 degrees north to 35 degrees south latitude, below an altitude of 1000 meters.


Four types of dengue infection, i.e., DHF I, DHF II, DHF III, and DHF IV, are observed. Another


Bhattacharya et al. 


observation is to track the day on which the fever subsides, which is called day-0. Day-0 is the critical date for dengue-affected patients. In our research area, the authors try to find out the symptoms before and after a patient is affected by dengue. It is also observed that, after recovery from dengue, organs in the patient's body are damaged. All these data were collected from all corners of the state of West Bengal. The researchers tested the DF data and obtained an accuracy level of 87%.


Data Mining (DM) is the computational examination of large data sets using a combination of statistical analysis, database technology, and machine learning, with the objective of detecting trends and obtaining research results in various fields.


The authors have applied different types of algorithms concerned with classification, clustering, and prediction. They used methods such as Random Forest (RF), Support Vector Machine (SVM), Decision Tree (DT), Logistic Regression (LR), and Naive Bayes (NB), and evaluated the results using the statistical measures Cohen's Kappa and ROC_AUC to assess the level of accuracy in this research. In this paper, the authors examine the framework of observation on databases related to epidemic diseases like dengue.


2. Review Literature 


According to previous studies, symptoms such as rash, joint pain, metallic taste, headache, and vomiting can be seen in people affected by dengue within two weeks. Dengue patients die mostly because the disease is not diagnosed quickly. Researchers have used a dataset listing 58 ailments or diseases. It was modeled with Bayes Server (BS), a machine learning tool that is able to detect dengue hemorrhagic fever (DHF) by learning which symptoms are related to fever and similar conditions. An accuracy of 99.84% was reported from this analysis.


The research work of (Fathima and Manimegalai) shows how support vector machines (SVM) and data mining analytics can be used to track the arboviral disease dengue. The approach could also be used to identify rules and patterns for future decision-making based on the data sets. The accuracy obtained is quite satisfactory. However, since there are many parameters, processing takes a long time. The data are brought together from multiple sources and are processed dynamically.


Efficient analytical methodology is essential for extracting new and important information from health-related data for the health industry: checking fraud in health insurance, making medical solutions available to patients at lower cost, detecting cases of disease, identifying effective medical treatment methods, and forming efficient health care policies. Data mining techniques are adequate to analyse the factors responsible for diseases, such as food, working environment, educational level, living conditions, availability of fresh water, health care services, and cultural, environmental, and agricultural factors. The authors gave cautionary guidelines for using data mining techniques. Redundant and inappropriate attributes need to be recognized, because they may act as noise and outliers and slow down the processing task. These attributes may adversely affect the performance of the classifier, and statistical methods are useful for recognizing them. No single classifier can produce the best result for every data set. A data set is divided into training and testing parts, and the performance of a classifier is judged using the testing data set. However, a testing data set may sometimes be easy and sometimes complicated. To avoid this problem, cross-validation may be used to obtain a reliable performance estimate across both training and testing data. A hierarchical clustering technique is used where there is less information. Besides dendrograms, partitioning algorithms are analyzed to overcome the shortcomings of clustering. Association is useful for identifying relationships among various attributes; insignificant associations are removed by experts.


Previous studies have set out to identify the environmental conditions conducive to outbreaks of dengue fever, to trace the spatial variation of the disease in different parts of Kolkata, to identify the socio-economic grounds behind the magnitude of the disease in slum areas of Kolkata, to learn how outbreaks vary among people according to their housing conditions, to assess the role of government and NGOs in controlling the spread of the illness, and to understand the level of awareness among ordinary people about the danger of the mosquito bites that lead to dengue fever. In doing so, the researcher considered the areas adjacent to Kolkata, especially parts of Howrah and North and South 24 Parganas, three administrative districts of West Bengal, and compared the incidence of dengue fever there with the rest of India in respect of spread and case fatality rate. According to the researcher, a more careful approach must be taken to combat the disease, community participation is required in urban and rural areas, and awareness building has to be more intensive to check mosquito breeding.


(Hasan et al.) have described the dengue virus as a virus of the genus Flavivirus of the family Flaviviridae, which includes four serotypes: DEN1, DEN2, DEN3, and DEN4. According to them, dengue is a sickness of tropical and subtropical countries. The upsurge of dengue is due to population growth, unplanned urbanization, inadequate mosquito control, frequent air travel, and the scarcity of health awareness facilities. Dengue occurs in more than 100 countries, including countries in Europe and the USA. The authors are of the view that a dengue-like epidemic was first recorded in 1780 and that the first virologically confirmed dengue epidemic occurred in Calcutta and Eastern India in 1963-1964. Dengue fever is a flu-like infectious disease that attacks persons of all ages and occurs chiefly during the rainy season. It is spread by the bite of the Aedes mosquito. Dengue virus infection does not give a clearly identifiable clinical response, so accurate diagnosis is very difficult before clinical tests. No antiviral for dengue has been discovered; physicians prescribe analgesic medicine as supportive care, along with fluid intake and sufficient bed rest.


3. Methodology 


Data mining techniques are used to extract the necessary information from clinical data and to provide evidence for medical decisions about the symptoms of dengue patients. The data and the included attributes play a critical role in the success of any data mining research. In this study, the classification models LR, RF, DT, SVM, and NB are used, and the confusion matrix is used to establish the association between the actual class attributes and the predicted class attributes. The classification model is mainly used to count the right and wrong classifications for each possible value of the variables. In particular, the ROC_AUC and Cohen's Kappa statistical methods are used to assess the predicted results. Three performance measurements, i.e., accuracy, sensitivity, and specificity, are used in this study.


3.1. Support Vector Machine (SVM) 


SVM is a classification model invented by Vapnik et al. in 1998. To handle soft-margin (non-linear) SVM problems, the researcher tries to find the maximum-margin hyperplane, which gives a robust separator for the classes of the problem.

(Chang) studied 100 samples of dengue patients; with the support of the SVM algorithm, a ranking of the weights of dengue symptoms was found, which helped to detect the highest-weight parameters from 24 support vectors. (Gomes and Lisa) applied the SVM algorithm to gene expression data, analyzing 12 genes of 28 dengue patients during severe viral infection. (I Nordin) used the kernel function of SVM to handle the relationship between the dependent and independent variables and achieve better performance in predicting dengue cases.

The equation of a linear SVM can be written as

$$g(q) = \sum_{i} \beta_i \, p_i \, q_i^{T} q + a_0 \qquad (1)$$

where $q_i$ is the instance with label $p_i$, $\beta_i$ is the Lagrange multiplier, and $a_0$ is the bias.

The equation for kernel SVM is

$$g(q) = \sum_{i=1}^{n} \beta_i \, p_i \, K(q_i, q) + a_0 \qquad (2)$$

where $n$ denotes the number of support vectors and $K(q_i, q)$ is the kernel function.
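
The kernel decision function above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the RBF kernel, the `gamma` value, and the two toy support vectors with their multipliers are hypothetical and are not taken from the study.

```python
import math


def rbf_kernel(u, v, gamma=0.5):
    """Gaussian (RBF) kernel K(u, v) = exp(-gamma * ||u - v||^2) (assumed kernel)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)


def decision_value(q, support_vectors, labels, betas, bias, kernel=rbf_kernel):
    """g(q) = sum_i beta_i * p_i * K(q_i, q) + a_0."""
    total = sum(b * p * kernel(qi, q)
                for qi, p, b in zip(support_vectors, labels, betas))
    return total + bias


def predict(q, support_vectors, labels, betas, bias):
    """The predicted class is the sign of the decision value."""
    return 1 if decision_value(q, support_vectors, labels, betas, bias) >= 0 else -1


# Hypothetical toy model: one support vector per class.
SUPPORT_VECTORS = [(0.0, 0.0), (2.0, 2.0)]
LABELS = [-1, 1]       # p_i
BETAS = [1.0, 1.0]     # Lagrange multipliers beta_i
BIAS = 0.0             # a_0

print(predict((1.9, 2.1), SUPPORT_VECTORS, LABELS, BETAS, BIAS))
print(predict((0.1, 0.0), SUPPORT_VECTORS, LABELS, BETAS, BIAS))
```

In practice the multipliers and bias come from training; here they are fixed by hand so that the formula itself is easy to follow.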


3.2. Decision tree (DT) 


The DT model is applied in data mining to draw conclusions from observations of the data set. In this supervised classification tree, leaves represent class labels and branches represent conjunctions of features. A decision tree explicitly represents both decisions and decision-making. Two types of decision trees are used in data mining: the classification tree and the regression tree. In classification tree analysis, the predicted result is the class from the data set; in regression tree analysis, the predicted result is a real number. The main advantages of DT are:

• The result of the observation can be displayed graphically.

• It handles both numerical and categorical data, with categorical values converted to 0-1 values.

• There is no need to normalize the data; only a little data preparation is required.

• DT is a white-box model: its behavior is easily explained by Boolean logic, unlike a black-box model, whose results are difficult to explain.

• It works with extensive data sets.

(Rosid et al.) tested data on the symptoms that affect DHF using the DT method with the ID3 algorithm and achieved an accuracy above 82%.
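
ID3 chooses the attribute to split on by information gain, i.e., the reduction in entropy of the class labels. The sketch below illustrates that criterion only; the four rows and their yes/no symptom values are hypothetical and do not come from the survey data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(rows, labels, feature_index):
    """Entropy of the labels minus the weighted entropy after splitting on one feature."""
    total = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature_index], []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical rows: (symptom A present?, symptom B present?)
ROWS = [("yes", "yes"), ("yes", "no"), ("no", "yes"), ("no", "no")]
LABELS = [1, 1, 0, 0]  # 1 = positive case, 0 = negative case

gains = [information_gain(ROWS, LABELS, i) for i in range(2)]
best_feature = max(range(2), key=lambda i: gains[i])
print(gains, best_feature)
```

Here symptom A separates the classes perfectly, so it has the larger gain and would be chosen as the root split.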


3.3. Logistic Regression (LR)


LR is a statistical method for an outcome with two possibilities, true or false, i.e., 1 or 0. The LR model has two further categories: multinomial logistic regression, which has more than two outputs, and ordinal LR. The LR model itself models the probability of the output in terms of its input; it does not by itself perform statistical classification.

In regression analysis, LR uses continuous and categorical variables for prediction. The dependent variable is treated as the outcome of a Bernoulli trial, i.e., the binomial case with a single trial.

In the LR method, the logistic function, which is the cumulative distribution function of the logistic distribution, is used to estimate probabilities and so model the relationship between the categorical dependent variable and one or more independent variables. In a previous study, LR analysis was used to detect the symptoms and physical signs that distinguish DF from other febrile infections within the first two days of illness, with 74% sensitivity and 79% specificity. (Chien et al.) used LR to determine valid predictor variables of DF when the probability of a Type I error was less than 0.05.
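
As a minimal sketch of how the logistic function turns a weighted sum of inputs into a probability and then into a 0/1 label, consider the following. The weights, intercept, and 0.5 threshold are hypothetical illustration values, not fitted coefficients from this study.

```python
import math


def logistic(z):
    """Logistic function 1 / (1 + exp(-z)), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(features, weights, intercept):
    """Probability of class 1 for one instance."""
    z = intercept + sum(w * x for w, x in zip(weights, features))
    return logistic(z)


def predict_label(features, weights, intercept, threshold=0.5):
    """Map the probability to a 0/1 label with an assumed threshold."""
    return 1 if predict_proba(features, weights, intercept) >= threshold else 0


# Hypothetical coefficients for two features.
WEIGHTS = (1.0, -1.0)
INTERCEPT = 0.0

print(predict_proba((2.0, 1.0), WEIGHTS, INTERCEPT))
print(predict_label((1.0, 2.0), WEIGHTS, INTERCEPT))
```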


3.4. Naive Bayes (NB) 


The NB model is based on Bayes' theorem. It uses conditional probability and computes the posterior class probability for each instance in the data set by Bayes' theorem:

$$p(A_i \mid a) = \frac{p(a \mid A_i)\, p(A_i)}{p(a)} \qquad (3)$$

where $A_i$ and $a$ are events, $p$ denotes probability, and $p(a) \neq 0$.

By definition, the chain rule of conditional probability can be applied to the class-conditional probability of the features:

$$p(a_1, a_2, \ldots, a_n \mid A_i) = p(a_1 \mid A_i)\, p(a_2, \ldots, a_n \mid A_i, a_1) \qquad (4)$$

Applying it repeatedly:

$$p(a_1, a_2, \ldots, a_n \mid A_i) = p(a_1 \mid A_i)\, p(a_2 \mid A_i, a_1) \cdots p(a_n \mid A_i, a_1, a_2, \ldots, a_{n-1}) \qquad (5)$$

Under the conditional independence assumption:

$$p(a_1, a_2, \ldots, a_n \mid A_i) = p(a_1 \mid A_i)\, p(a_2 \mid A_i) \cdots p(a_n \mid A_i) = \prod_{j=1}^{n} p(a_j \mid A_i) \qquad (6)$$

So the posterior class probability (according to the NB classifier) is

$$p(A_i \mid a_1, a_2, \ldots, a_n) = \frac{1}{k}\, p(A_i) \prod_{j=1}^{n} p(a_j \mid A_i) \qquad (7)$$

where $k = p(a_1, a_2, \ldots, a_n)$.
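
The posterior in equation (7) can be computed directly once the priors and per-feature likelihoods are known. The sketch below assumes categorical features; the class names, feature order, and all probability values are hypothetical and serve only to show the arithmetic.

```python
def naive_bayes_posteriors(priors, likelihoods, observation):
    """Return p(A_i | a_1..a_n) for every class A_i.

    priors:       {class: p(A_i)}
    likelihoods:  {class: [ {value: p(a_j = value | A_i)} for each feature j ]}
    observation:  tuple of observed feature values (a_1, ..., a_n)
    """
    unnormalized = {}
    for cls, prior in priors.items():
        score = prior
        for j, value in enumerate(observation):
            score *= likelihoods[cls][j][value]  # conditional independence
        unnormalized[cls] = score
    k = sum(unnormalized.values())  # evidence p(a_1..a_n), the normalizer k
    return {cls: score / k for cls, score in unnormalized.items()}


# Hypothetical numbers for two features (rash, bleeding).
PRIORS = {"dengue": 0.5, "other": 0.5}
LIKELIHOODS = {
    "dengue": [{"yes": 0.8, "no": 0.2}, {"yes": 0.6, "no": 0.4}],
    "other": [{"yes": 0.2, "no": 0.8}, {"yes": 0.1, "no": 0.9}],
}

posteriors = naive_bayes_posteriors(PRIORS, LIKELIHOODS, ("yes", "yes"))
print(posteriors)
```

With these made-up values the unnormalized scores are 0.24 and 0.01, so the normalizer is 0.25 and the posterior for the "dengue" class is 0.96.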

(Caicedo-Torres, William, and Pinzón) used Gaussian priors in the NB model to find the mean and estimated variance of each feature. (Arafiyah, Ria, and Hermin) took fever, bleeding, spotting, and the tourniquet test as input data and used the NB model to predict whether or not a person was affected by dengue. Measured using ROC, the prediction accuracy of the NB classification algorithm was 69%.


3.5. Random Forest (RF) 


The random decision forest, a machine learning algorithm for classification and regression, was first proposed by Ho in 1995. The first conceptual paper on Random Forest was written by Leo Breiman in 2001. It is one of the most popular ensemble classification techniques: combining many supervised classifiers increases accuracy up to a certain level and alleviates errors. Deep individual trees have low bias but very high variance; RF combines multiple deep decision trees trained on different parts of the dataset in order to reduce the variance. Simple bootstrap aggregating (bagging) can be used for RF because it decreases the variance of the model without increasing the bias. RF is nonparametric and can handle categorical and multi-modal data, which may be ordinal or non-ordinal. RF has previously been applied in combination with ANN to evaluate dengue model performance (Silitonga and Permatasari). Based on the results from their data set of patients' medical records, they showed that the classification accuracy of the RF algorithm was much better than that of the compared model; measuring the performance difference, they obtained an AUC for RF between 0.80 and 0.90. This result supports an accurate DHF prediction system that avoids errors in diagnosing DHF.
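
The bootstrap aggregating idea described above can be sketched as follows. This shows only the two bagging ingredients, resampling with replacement and majority voting; it omits the random feature selection at each split that a full Random Forest adds, and the stand-in "models" are hypothetical functions rather than trained trees.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement; each tree trains on one such sample."""
    return [rng.choice(rows) for _ in rows]


def majority_vote(predictions):
    """Return the most common predicted label."""
    return Counter(predictions).most_common(1)[0][0]


def bagged_predict(models, x):
    """Combine the predictions of all models for one input by majority vote."""
    return majority_vote([model(x) for model in models])


# Hypothetical stand-ins for three trained trees.
MODELS = [lambda x: 1, lambda x: 0, lambda x: 1]

rng = random.Random(0)
sample = bootstrap_sample(list(range(10)), rng)
print(sample)
print(bagged_predict(MODELS, x=None))
```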


3.6. AdaBoost 


AdaBoost stands for "Adaptive Boosting". After an initial training subset is selected, the algorithm trains the AdaBoost model iteratively. From the second iteration onward, it gives higher weight to the observations that were wrongly classified in the previous round, so that they have a stronger probability of being classified correctly. This process continues until the training data are fitted without error.

$$W(x) = \operatorname{sign}\left(\sum_{i} \alpha_i \, w_i(x)\right) \qquad (8)$$

where $W(x)$ is the output of the combined classifier for input $x$, $w_i(x)$ refers to the output of weak classifier $i$ for input $x$, and $\alpha_i$ denotes the weight assigned to that classifier:

$$\alpha_i = 0.5 \log\left(\frac{1 - E}{E}\right)$$

where $E$ denotes the error rate.

The weights of the training points are then normalized by dividing each of them by the sum of all weights, and $y_t$ is the label of training point $(x_t, y_t)$.
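
One boosting round can be sketched numerically as below. The classifier weight follows the formula given above with a natural logarithm (an assumption, since the text only writes "log"); the exponential re-weighting step is the standard AdaBoost update and is likewise an assumption, because the update rule itself is not legible in the text. Labels and weak-classifier outputs are taken to be -1 or +1.

```python
import math


def classifier_weight(error_rate):
    """alpha = 0.5 * ln((1 - E) / E), defined for 0 < E < 1."""
    return 0.5 * math.log((1.0 - error_rate) / error_rate)


def update_weights(weights, labels, predictions, alpha):
    """Raise the weights of misclassified points, lower the rest, then normalize."""
    raw = [w * math.exp(-alpha * y * h)
           for w, y, h in zip(weights, labels, predictions)]
    total = sum(raw)
    return [r / total for r in raw]


def strong_classify(alphas, weak_outputs):
    """Sign of the alpha-weighted sum of weak-classifier outputs for one input."""
    score = sum(a * h for a, h in zip(alphas, weak_outputs))
    return 1 if score >= 0 else -1


# Hypothetical round: four points with uniform weights, one misclassified.
WEIGHTS = [0.25, 0.25, 0.25, 0.25]
LABELS = [1, 1, -1, -1]
PREDICTIONS = [1, 1, -1, 1]

error = sum(w for w, y, h in zip(WEIGHTS, LABELS, PREDICTIONS) if y != h)
alpha = classifier_weight(error)
new_weights = update_weights(WEIGHTS, LABELS, PREDICTIONS, alpha)
print(alpha, new_weights)
```

After the update the single misclassified point carries half of the total weight, which is what forces the next weak classifier to concentrate on it.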


3.7. Cohen’s Kappa (CK) 


CK is a statistical procedure for measuring the reliability with which two raters give the same rating. The reliability of the raters depends on the number of agreeing scores. In Kappa statistics, K measures the agreement between the categorical variables x and y.

The value of K is interpreted as follows:

1. 0: agreement equivalent to chance
2. 0.10 - 0.20: slight agreement
3. 0.21 - 0.40: fair agreement
4. 0.41 - 0.60: moderate agreement
5. 0.61 - 0.80: substantial agreement
6. 0.81 - 0.99: near perfect agreement
7. 1: perfect agreement

To calculate K, the authors used SPSS software. The formula for Cohen's Kappa is

$$K = \frac{A_0 - A_e}{1 - A_e}$$

where the probability of agreement is $A_0 = \text{(number in agreement)} / \text{(total)}$, and

$$A_e = A(\text{correct}) + A(\text{incorrect})$$

$$A(\text{correct}) = \frac{P + Q}{P + Q + R + S} \times \frac{P + R}{P + Q + R + S}$$

$$A(\text{incorrect}) = \frac{R + S}{P + Q + R + S} \times \frac{Q + S}{P + Q + R + S}$$

where

P is the number of items that both raters marked correct; the raters are in agreement.

Q is the number of items that rater 1 marked incorrect but rater 2 marked correct; this is a disagreement.

R is the number of items that rater 2 marked incorrect but rater 1 marked correct; this is a disagreement.

S is the number of items that both raters marked incorrect; this is an agreement.

To interpret the results of the raters, the authors used an N × N grid.
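
The Kappa formula above reduces to a few lines of arithmetic on the four counts P, Q, R, and S. The sketch below uses hypothetical counts chosen so that the result is easy to verify by hand; they are not the counts from this study.

```python
def cohens_kappa(p, q, r, s):
    """Cohen's Kappa from the 2 x 2 agreement counts P, Q, R, S defined in the text."""
    total = p + q + r + s
    observed = (p + s) / total                                   # A_0
    chance_correct = ((p + q) / total) * ((p + r) / total)       # A(correct)
    chance_incorrect = ((r + s) / total) * ((q + s) / total)     # A(incorrect)
    expected = chance_correct + chance_incorrect                 # A_e
    return (observed - expected) / (1.0 - expected)


# Hypothetical counts: 35 of 50 items are rated the same by both raters.
kappa = cohens_kappa(p=20, q=5, r=10, s=15)
print(kappa)
```

For these counts A_0 = 0.7 and A_e = 0.3 + 0.2 = 0.5, so K = 0.4, which falls in the "fair agreement" band of the scale above.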


3.8. Receiver Operating Characteristic Curve and Area Under the Curve (ROC_AUC)


ROC stands for Receiver Operating Characteristic, and AUC stands for Area Under the Curve. Because the ROC curve compares two operating characteristics, the True Positive Rate (TPR) and the False Positive Rate (FPR), it is also known as the relative operating characteristic curve. The ROC curve is constructed by plotting the TPR, which is also called sensitivity or, in machine learning, the probability of detection, against the FPR, which is the probability of false alarm and is computed as (1 - Specificity). The AUC of the ROC curve is computed using the roc_auc_score() function. Both the roc_curve and roc_auc_score functions take the true outcomes (0, 1) and the predicted probabilities for class 1.
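
To make the quantity concrete without depending on any library, the sketch below computes the AUC as the fraction of (positive, negative) pairs in which the positive instance receives the higher score, with ties counted as one half; this pairwise count equals the area under the ROC curve. It is an illustrative re-implementation, not the scikit-learn code the text refers to, and the four labels and scores are hypothetical.

```python
def roc_auc(y_true, y_score):
    """AUC as the probability that a random positive outscores a random negative."""
    positives = [s for y, s in zip(y_true, y_score) if y == 1]
    negatives = [s for y, s in zip(y_true, y_score) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for sp in positives:
        for sn in negatives:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def tpr_fpr(y_true, y_score, threshold):
    """One point of the ROC curve: (TPR, FPR) when scores >= threshold are called positive."""
    tp = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s >= threshold)
    pos = sum(1 for y in y_true if y == 1)
    neg = len(y_true) - pos
    return tp / pos, fp / neg


Y_TRUE = [0, 0, 1, 1]
Y_SCORE = [0.1, 0.4, 0.35, 0.8]  # predicted probabilities for class 1

auc = roc_auc(Y_TRUE, Y_SCORE)
point = tpr_fpr(Y_TRUE, Y_SCORE, threshold=0.5)
print(auc, point)
```

Three of the four positive-negative pairs are ordered correctly here, so the AUC is 0.75.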

Sensitivity or hit rate or TPR = Tp / P = Tp / (Tp + Fn) = 1 - FNR

Specificity or Selectivity or True Negative Rate (TNR) = Tn / N = Tn / (Tn + Fp) = 1 - FPR

Precision or Positive Predicted Value (PPV) = Tp / (Tp + Fp) = 1 - FDR

Negative Predicted Value (NPV) = Tn / (Tn + Fn)

False Negative Rate (FNR) or Miss Rate = Fn / P = Fn / (Fn + Tp) = 1 - TPR

False Positive Rate (FPR) = Fp / N = Fp / (Fp + Tn) = 1 - TNR

False Discovery Rate (FDR) = Fp / (Fp + Tp) = 1 - PPV

False Omission Rate (FOR) = Fn / (Fn + Tn) = 1 - NPV

Accuracy (ACC) = (Tp + Tn) / (P + N) = (Tp + Tn) / (Tp + Fp + Tn + Fn)

Here P and N are the numbers of actual positive and actual negative cases.

TABLE 1. Calculation of Cohen's Kappa

                      Rater-2 Correct   Rater-2 Incorrect
Rater-1 Correct              P                  R
Rater-1 Incorrect            Q                  S

F1 Score is the harmonic mean of precision and sensitivity, so

F1 = 2 (PPV × TPR) / (PPV + TPR) = 2Tp / (2Tp + Fp + Fn)

where Tp = True Positives, Tn = True Negatives, Fp = False Positives, Fn = False Negatives.
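
The confusion-matrix rates defined in this section can be checked with a small helper. This is a sketch for illustration; the four counts passed in at the bottom are hypothetical and are not the confusion matrix of any model in this paper.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Compute the rates defined above from the four confusion-matrix counts."""
    sensitivity = tp / (tp + fn)            # TPR
    specificity = tn / (tn + fp)            # TNR
    precision = tp / (tp + fp)              # PPV
    npv = tn / (tn + fn)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "npv": npv,
        "fnr": 1.0 - sensitivity,
        "fpr": 1.0 - specificity,
        "fdr": 1.0 - precision,
        "for": 1.0 - npv,
        "accuracy": accuracy,
        "f1": f1,
    }


# Hypothetical counts for 20 test cases.
metrics = confusion_metrics(tp=8, fp=2, tn=6, fn=4)
print(metrics)
```

The F1 value returned here agrees with the harmonic-mean form 2 (PPV × TPR) / (PPV + TPR), which is a quick consistency check on the two expressions given above.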
4. Data sets 


The sample data on dengue patients from all over West Bengal were collected not only from several hospitals but also by interacting individually with people who had suffered from DF, using the questionnaire method. The data containing the patients' information were collected by the researcher from 2016 to 2019.


4.1. Questionnaires for Survey


• Patient details:

1. Name of the patient:
2. Name of village / town / district:
3. Is the residence under a Panchayat area or a Municipality area:
4. Age:
5. Gender: M/F
6. Number of educated people in the family:
7. Occupation:
8. What precautions have you taken against mosquito bites?

• What were the symptoms before dengue was detected?

• By how much did the temperature rise?

• For how many days did the fever last?

• What were the symptoms after dengue was detected?

• Observation of platelet count:

• Is the blood pressure fluctuating or not (mention BP):

• Did bleeding occur?

• What symptoms appeared after recovery from dengue?

• Were you admitted to a hospital / nursing home?

• What medical treatments, tests, etc. were carried out in the hospital?

• Has dengue had any effect on any other organ?

• What are the key reasons for being affected by dengue?


During the study, clinical statements were recorded from the patients at various stages; the data consist of 91 patients and 13 fields. Not all of the collected raw data were considered in the experiments. The selected fields played a significant role in the study. They cover three areas: symptoms before the detection of DF, symptoms after the detection of DF, and the body organs affected after recovery from DF.

The 13 fields used by the authors include: age, symptoms before detecting dengue, test, temperature level, duration of fever in days, symptoms after detecting dengue, platelet count, average BP, whether bleeding occurred, symptoms after recovery from dengue, hospitalized or not, and other organs affected after recovery.


5. Results 


Number of True cases: 41 (45.05%)
Number of False cases: 50 (54.95%)
79.12% in training set
20.88% in test set
Original True : 41 (45.05%)
Original False : 50 (54.95%)
Training True : 30 (41.67%)
Training False : 42 (58.33%)
Test True : 11 (57.89%)
Test False : 8 (42.11%)
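
The split sizes reported above can be checked with a few lines of arithmetic. The sketch below takes only the class counts from the report (41/50 overall, 30/42 in training, 11/8 in test) and recomputes each share; the helper names are hypothetical.

```python
def share(count, total):
    """Percentage of total, rounded to two decimals."""
    return round(100.0 * count / total, 2)


def class_shares(counts):
    """Percentage of each class within one subset."""
    total = sum(counts.values())
    return {label: share(n, total) for label, n in counts.items()}


ORIGINAL = {"true": 41, "false": 50}
TRAINING = {"true": 30, "false": 42}
TEST = {"true": 11, "false": 8}

n_total = sum(ORIGINAL.values())
train_share = share(sum(TRAINING.values()), n_total)
test_share = share(sum(TEST.values()), n_total)

print(train_share, test_share)
print(class_shares(ORIGINAL), class_shares(TRAINING), class_shares(TEST))
```

The class balance differs noticeably between the training and test subsets, which suggests the split was not stratified; with only 19 test cases, each misclassification shifts test accuracy by more than five percentage points.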


5.1. Creating new classifier with ensemble 


After combining LR, RF, NB, and DT individually with the AdaBoost model, we obtained the results given below.

From the report, it is observed that the highest accuracy level is reached using RF with the AdaBoost machine learning method. We propose to call this new combination the Ensemble Random Forest (ERF) model. This is our newly designed classifier, which appears to be the most effective in this particular application. The summarized report using ERF is given below.


5.2. Summarized report using ERF Model 


Accuracy = 81.556
Standard deviation = 0.12374805148734092
Sensitivity = 82.889
Precision = 84.606
F Score = 83.739
ROC_AUC: 87.15
Kappa Score: 68.908


[Figure: ROC curve ensemble for Dengue; x-axis: False Positive Rate (1 - Specificity); y-axis: True Positive Rate (Sensitivity)]

FIGURE 1. ROC curve ensemble for Dengue (Output 1)

5.3. Ensemble New Model 


Accuracy = 87.0
Standard deviation = 0.1342606641050383
Sensitivity = 78.111
Precision = 83.741
F Score = 80.828
ROC_AUC: 89.25
Kappa Score: 66.986


[Figure: ROC curve ensemble for Dengue; x-axis: False Positive Rate (1 - Specificity); y-axis: True Positive Rate (Sensitivity)]

FIGURE 2. ROC curve ensemble for Dengue (Output 2)

(Rohan and Islam) used the same ensemble of RF with AdaBoost for detecting breast cancer. They analyzed 699 instances, of which 458 were benign and 241 malignant, with 11 features and 10 attributes. In the testing phase, their model provided 98.5714% accuracy, 100% sensitivity, and 96.296% specificity.

The introduced model performs better than the conventional RF classifier.

6. Discussion of Results


We analyzed the information of 91 patients, with 79.12% in the training set and 20.88% in the test set; the results obtained with the several models are summarized below.

For the data set classified by LR, the accuracy from the confusion matrix is 0.4737, Cohen's Kappa score is 0.020619, and ROC_AUC is 0.511364.

Using RF, from the confusion matrix, the training accuracy is 0.9722 and the testing accuracy is 0.7368; Cohen's Kappa score is 0.486486 and ROC_AUC is 0.755682.

The training accuracy and the testing accuracy for the NB model are 0.4722 and 0.6316, respectively. Cohen's Kappa score is 0.141935 and ROC_AUC is 0.562500.

For DT, the training accuracy is 1.0 and the testing accuracy is 0.8421; Cohen's Kappa is 0.681564 and ROC_AUC is 0.84.

For SVM, from the confusion matrix, the training accuracy is 0.7083 and the testing accuracy is 0.3684; Cohen's Kappa = 0.40 and ROC_AUC = 0.420455.

TABLE 2. Classification report on the basis of precision

                    LR     RF     NB     DT     SVM
Micro Average       0.47   0.74   0.63   0.84   0.37
Macro Average       0.51   0.76   0.81   0.84   0.35
Weighted Average    0.53   0.77   0.77   0.85   0.35


TABLE 3. Classification report on the basis of recall

                    LR     RF     NB     DT     SVM
Micro Average       0.47   0.74   0.63   0.84   0.37
Macro Average       0.51   0.76   0.56   0.85   0.42
Weighted Average    0.47   0.74   0.63   0.84   0.37


TABLE 4. Classification report on the basis of f1 score

                    LR     RF     NB     DT     SVM
Micro Average       0.47   0.74   0.63   0.84   0.37
Macro Average       0.46   0.74   0.49   0.84   0.32
Weighted Average    0.45   0.74   0.53   0.84   0.29


TABLE 5. Classification report on the basis of support

                    LR     RF     NB     DT     SVM
Micro Average       19     19     19     19     19
Macro Average       19     19     19     19     19
Weighted Average    19     19     19     19     19


TABLE 6. Statistical report without ensemble

                 LR         RF         NB         DT         SVM
Cohen's Kappa    0.020619   0.486486   0.141935   0.681564   0.140006
ROC_AUC          0.511364   0.755682   0.562500   0.846591   0.420455


TABLE 7. Classification report with ensemble on the basis of precision

                    LR     RF     NB     DT
Micro Average       0.42   0.74   0.47   0.84
Macro Average       0.45   0.73   0.55   0.84
Weighted Average    0.46   0.74   0.57   0.85


TABLE 8. Classification report with ensemble on the basis of recall

                    LR     RF     NB     DT
Micro Average       0.42   0.74   0.47   0.84
Macro Average       0.47   0.74   0.53   0.85
Weighted Average    0.42   0.74   0.47   0.84


TABLE 9. Classification report with ensemble on the basis of f1 score

                    LR     RF     NB     DT
Micro Average       0.42   0.74   0.47   0.84
Macro Average       0.59   0.73   0.43   0.84
Weighted Average    0.37   0.74   0.41   0.84


TABLE 10. Classification report with ensemble on the basis of support

                    LR     RF     NB     DT
Micro Average       19     19     19     19
Macro Average       19     19     19     19
Weighted Average    19     19     19     19


TABLE 11. Statistical report with ensemble

                 LR          RF         NB         DT         SVM
Cohen's Kappa    0.060914    0.469274   0.500000   0.681564   0.060914
ROC_AUC          0.4659090   0.738636   0.528409   0.846591   0.4659090


After ensembling, the prediction accuracy of RF reached 0.7368.


Finally, in the experiment on the new ensemble model, the accuracy reached 87.0, with standard deviation = 0.134, precision = 83.74, sensitivity = 78.11, F score = 80.828, ROC_AUC = 89.25, and Kappa score = 66.986.


In one earlier study, RF was used as a single classifier on the data, with the performance shown in Table 12. That study did not investigate multiple classifiers such as LR, SVM, DT, and NB, whereas in this paper we used all the stated models for a proper investigation, minimizing FPR and FNR for each of them, and finally obtained the predicted result. A. Osarumwense and B. Eromosele (2020) implemented the popular machine learning Bayesian belief network model, designed on the Bayes Server platform, to predict hemorrhagic fever and its symptoms. Compared with others (Balasaravanan and Prakash), our prediction accuracy is much better, as shown in Table 12. Dasgupta, Sharma, Sinha, and Raghavendra (2019) used three machine learning algorithms, RF, DT, and SVM, on their survey data and found the accuracy levels shown in Table 12. (Mello-Roman et al.) worked with 90% training and 10% testing data from a data set for the early detection and diagnosis of DF. They used an Artificial Neural Network (ANN), applying the multilayer perceptron (MLP) and the radial basis function (RBF), and an SVM classifier, for which three kernel functions were evaluated with the support of IBM SPSS software. Using the ANN-MLP classifier, they obtained 96% accuracy, 96% sensitivity, and 97% specificity with low variation, and using SVM with a polynomial kernel they obtained an accuracy level of 90%. In the research of (Salami), for the pre-detection of DF, four machine learning models (pls, glmnet, RF, xgboost) were evaluated on the testing data set with ROC_AUC as the quantitative performance measure, as shown in Table 12.

In the paper of (Kapoor, Kadyan, and Ahuja), the authors conducted an analytical study, used standard parameters, and prepared a dataset for a machine learning predictive model for the early detection of DF. That paper focused on four main factors for the early detection of dengue: fever, skin disease, headache, and abdominal pain.


7. Conclusion 


From Table 12, it is observed that the accuracy level obtained, i.e., 87.0%, is effective, using the several machine learning models stated above and the two most effective statistical methods, Cohen's Kappa and ROC_AUC. From the analysis of the data, the symptoms of dengue patients before and after the detection of DF are specifically marked. A very important conclusion of the paper is that body parts are found to be damaged in the patient's body after recovery from DF. This damage may have a minor or a major effect on the human body. The damaged organs may include the liver, prostate, and spleen; other effects include spots on the body, enzyme system failure, and failure of the nervous system in the brain. The study of the data also shows that dengue patients suffer from blood sugar problems, weight loss, appetite problems, and weakness after recovering from DF. Our study reveals that the four factors are not sufficient to detect dengue, because these four factors are now also symptoms of other diseases such as COVID-19. The authors found that more symptoms emerged from their investigation. Finally, the authors conclude that, in addition to the above four factors, eye pain or red eye or both, hiccups, and loose stool should also be added as major common factors for the early detection of DF.


TABLE 12. Quantitative measure for performance measure of four machine learning models (pls, glmnet, RF, xgboost)

In Paper    Accuracy   Standard Deviation   Sensitivity   Specificity   Precision   F_Score   ROC_AUC   Kappa Score
1           -          -                    88.1%         94.9%         -           -         -         -
2           92.34%     -                    94.04%        92.19%        -           -         -         -
3           -          -                    80%           65%           0.75
4           90%        -                    96%           97%           -           -         -         -
5           95.00%     -                    -             -             -           -         -         -
6           88%        -                    -             -             -           -         0.94      -
Our Study   87.0%      0.1342606641050383   78.111        -             83.741      80.828    89.25     66.986


8. Future Scope 


The proposed technique has been tested only on dengue classification and should be further evaluated on other clinical datasets. The proposed methodology can also be tested on other applications where the nature of the data is different.